December 13, 1937.


Director, 


Dear Sir:

A number of requests have come from the field for mimeographed
copies of the Yates lectures and conferences on statistical methods
given before the Department Graduate School October 28 to 30. In
the absence of any such record having been made, it was believed some
of the men might be interested in learning the general nature
of the subject matter covered. For this reason, the attached brief
resume was prepared by Miss Day from her personal notes. As such it
is necessarily extremely sketchy, as no attempt was made to supplement
the notes taken.


Very truly yours, 




I. T. HAIG, 
Acting Chief, Division of Silvics, 
A BRIEF RESUME OF THE YATES' LECTURES* OCTOBER 28 TO 30, 1937

*Lectures and conferences held at the Graduate School, U. S. Dept. of
Agriculture, by Frank Yates, Chief Statistician, Rothamsted Experiment
Station, Harpenden, England.

Those men who have kept up with writings on modern statistical
theory would have found very little that was new in these conferences.
However, they were most stimulating and worthwhile. One was impressed
with the fact that much of the subject matter Mr. Yates presented
had been developed by him from actual field experience, out of the
necessity of finding suitable statistical tools for the solution of
practical problems of the utmost importance, and that these same tools
had been tested by practical application and found to be good. His
approach at all times was simple and informal, introducing only the
minimum of mathematical theory and language. This was for the benefit
of non-mathematical persons to whom the particular subject matter was
more or less unfamiliar, but who were interested in its application to
their special problems. In the case of the development of new designs
he stated that the pressure has always come from the practical
agronomist and not the theorist.

Lecture I. Principles underlying the design of factorial (complex)
experiments.

The subject matter of this lecture, with the exception of some
very recent developments presented, may be found in Fisher's Design of
Experiments.
Factorial design, originally called complex, essentially involves
the inclusion of more than one factor; the idea is very simple and in
fact quite old. The ordinary simple factorial designs use 2 or more
factors, taking all combinations of their levels.


Figure 1, an experiment on potatoes, was introduced here: 32 plots
in 4 blocks of eight plots, all treatments appearing in each block,
hence 4 replications of each treatment. It included all combinations of
3 factors with two levels of each.


Nitrogen (n) )
Potash   (k) )  With two levels of each
Dung     (d) )

No treatment called (1)

Total number of treatments = 8, from 2 x 2 x 2 = 8
The treatment combinations are indicated on the diagram
The last line shows mean yields





Consider the interpretation of these mean yields:

Response to dung:  n and k absent,       d - (1)  =  8.6 - 2.8 = 5.8
                   n present, k absent,  nd - n   =  9.4 - 2.8 = 6.6
                   k present, n absent,  kd - k   = 11.2 - 7.5 = 3.7
                   n and k present,      nkd - nk = 12.1 - 8.1 = 4.0
                                                            4 / 20.1
                                                        D =    5.025

Mean response of dung,  D = 5.025 )
Likewise of potash,     K = 3.8   )  Main effects
and nitrogen,           N = 0.6   )


Precision is high since in each case it is the mean of 16 plots
minus the mean of 16 other plots.
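
The arithmetic of the main effects can be sketched as follows. This is only an illustration: the dictionary name YIELDS and the function main_effect are hypothetical, and several of the eight mean yields are assumed values, reconstructed so as to agree with the responses quoted in the text, not read from Figure 1 itself.

```python
# Mean yields (tons per acre) for the eight treatment combinations.
# ASSUMPTION: some of these figures are reconstructed from the responses
# quoted in the text (5.8, 6.6, 3.7, 4.0), not read directly from Figure 1.
YIELDS = {"(1)": 2.8, "n": 2.8, "k": 7.5, "nk": 8.1,
          "d": 8.6, "nd": 9.4, "kd": 11.2, "nkd": 12.1}


def main_effect(factor, yields=YIELDS):
    """Mean of the combinations containing `factor` minus mean of the rest."""
    present = [y for t, y in yields.items() if factor in t]
    absent = [y for t, y in yields.items() if factor not in t]
    return sum(present) / len(present) - sum(absent) / len(absent)


for name, letter in (("D", "d"), ("K", "k"), ("N", "n")):
    print(name, round(main_effect(letter), 3))
```

With the assumed yields the three results come out close to the 5.0, 3.8 and 0.6 of the résumé. Each is a difference between two means of four treatment means, that is, of 16 plots each.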


With these 32 plots the same precision is gained as if 16 replications
of nitrogen alone at two levels had been used. Hence, one advantage
of complex experiments is large gains in precision. Also we get
even more important information: the effects in response when other
factors are present. For example:

Dung in absence of potash     (5.8 + 6.6)/2 = 6.2
  "   "  presence "    "      (3.7 + 4.0)/2 = 3.8

Difference, or interaction
of dung and potash                          = -2.4


This is also the mean of 16 plots minus the mean of 16 other plots (this
is only true for 2 x 2 x 2).

For convenience the conventional factor of 1/2 is introduced; hence
the interaction D x K = -1.2.

Then also:

Main effect with K absent     5.0 - (-1.2) = 6.2
  "     "     "  K present    5.0 + (-1.2) = 3.8
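
A minimal sketch of the interaction arithmetic, under the same caveat that some of the mean yields are assumed (reconstructed from the quoted responses) and that the names YIELDS and response_to_dung are hypothetical:

```python
# ASSUMPTION: mean yields partly reconstructed from the responses in the text.
YIELDS = {"(1)": 2.8, "n": 2.8, "k": 7.5, "nk": 8.1,
          "d": 8.6, "nd": 9.4, "kd": 11.2, "nkd": 12.1}


def response_to_dung(potash_present):
    """Mean response to dung over both nitrogen levels, at one potash level."""
    if potash_present:
        pairs = [("kd", "k"), ("nkd", "nk")]
    else:
        pairs = [("d", "(1)"), ("nd", "n")]
    return sum(YIELDS[with_d] - YIELDS[without_d]
               for with_d, without_d in pairs) / 2


absent = response_to_dung(False)        # about 6.2
present = response_to_dung(True)        # about 3.8
main_d = (absent + present) / 2         # about 5.0
interaction_dk = (present - absent) / 2  # conventional factor of 1/2: about -1.2

print(round(main_d - interaction_dk, 2), round(main_d + interaction_dk, 2))
```

The last line recovers the response to dung with potash absent and present from the main effect and the interaction, as in the two lines of the résumé.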




From the above simple one, more difficult ones may be introduced.

Consider the simple physical problem: given a weighing machine of
very fine order for which a zero correction is wished, and seven objects
to be weighed separately (see Figure 2). Instead of making eight
separate weighings, the objects are weighed in groups, and if weight
a is desired it may be expressed as follows:

(W1 + W2 + W3 + W4) - (W5 + W6 + W7 + W8) = 4a

Hence, this dodge increases precision 4 times. In this case you know
there are no interactions.
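
The weighing dodge can be checked numerically. The sketch below is an assumption-laden illustration: the particular grouping of the seven objects in WEIGHINGS is one arrangement that satisfies the equation above (all seven together, then seven groups of three), and it is not taken from Figure 2; the weights and zero error are made-up numbers.

```python
OBJECTS = "abcdefg"

# ASSUMED grouping: every object appears in exactly four of the eight weighings,
# and every pair of objects appears together in exactly two of them.
WEIGHINGS = ["abcdefg", "abc", "ade", "afg", "bdf", "beg", "cdg", "cef"]


def estimate(obj, readings):
    """(Sum of the four weighings containing obj) - (sum of the other four), / 4."""
    signed = sum(w if obj in group else -w
                 for group, w in zip(WEIGHINGS, readings))
    return signed / 4


# Illustrative true weights and zero error (hypothetical values).
true_weights = {"a": 3.0, "b": 1.5, "c": 2.25, "d": 0.75,
                "e": 4.0, "f": 1.0, "g": 2.5}
zero_error = 0.25
readings = [zero_error + sum(true_weights[o] for o in group)
            for group in WEIGHINGS]
estimates = {o: estimate(o, readings) for o in OBJECTS}

# Variances in units of the variance of a single reading:
variance_combined = len(WEIGHINGS) / 4 ** 2  # eight readings, each divided by 4
variance_direct = 2.0                        # object reading minus zero reading
precision_gain = variance_direct / variance_combined
```

The zero error cancels because it enters four readings with a plus sign and four with a minus sign, and the other six objects cancel in the same way.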


In an experiment of factors N and V, with the first having 3
levels and the second 4, there will be 12 treatments.

Then the set-up is as follows:

                        N
                  0     1     2     Totals
            a     x     x     x       x   )
     V      b     x     x     x       x   )  Use these to
            c     x     x     x       x   )  calculate V
            d     x     x     x       x   )
       Totals     x     x     x       x

                  Use these to calculate N


The inclusion of additional factors increases the number of plots
rapidly.

Figure 3 is a 3 x 3 x 3 design showing sets of numbers, the 27
treatments being divided into 3 parts. If considering only two factors,
each block is a replication. With a random arrangement on the ground the
degrees of freedom are as follows:


N - 2     NP - 4
P - 2     NK - 4     NPK - 8
K - 2     PK - 4


The three-factor interaction NPK will be confounded with blocks.
Actually it is confounded only partially, for 2 degrees of freedom. If
different combinations are used with 4 replications, 3/4 of the
information for NPK will be given. It is always possible to do this with
2's and 3's but not with other numbers of levels.
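
One standard way of dividing the 27 treatments into 3 blocks of 9 can be sketched as below. The blocking rule used here, (n + p + k) modulo 3, is an assumption chosen for illustration and is not necessarily the arrangement of Figure 3; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Each treatment is a triple of levels (n, p, k), each level 0, 1 or 2.
treatments = list(product(range(3), repeat=3))

# ASSUMED blocking rule: block number = (n + p + k) mod 3.
# This sacrifices 2 of the 8 degrees of freedom of NPK to blocks.
blocks = {r: [t for t in treatments if sum(t) % 3 == r] for r in range(3)}


def is_replicate_for_pair(block, i, j):
    """True if every combination of factors i and j occurs exactly once."""
    seen = Counter((t[i], t[j]) for t in block)
    return seen == Counter(product(range(3), repeat=2))


degrees_of_freedom = {"N": 2, "P": 2, "K": 2,
                      "NP": 4, "NK": 4, "PK": 4, "NPK": 8}
```

Taking any two of the three factors, each block contains all nine of their combinations once, which is the sense in which each block is a replication when only two factors are considered.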





Some other developments in designs are noted below.

Figure 4 is a coded system for an experiment of 3 x 3 x 3 x 3, where
the first number represents a Latin square of nine, likewise the second
figure another Latin square. This design has 81 plots, with information
on 3 levels of 4 factors. Columns may be eliminated and the full
precision of a 9 x 9 Latin square obtained. These are called quasi-Latin
squares.


Another development is an extension of confounding to non-factorial
designs. With 81 varieties there are 9 groups of 9 varieties, but it is
not possible to compare varieties in one block with those in others. To
overcome this, cut across the groupings. The precision is

     p + 1     10
     -----  =  --
     p + 3     12

Called quasi-factorial or lattice. Arrange by taking rows as one set of
blocks and columns as the other set.
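
The row-and-column arrangement can be sketched as below for p = 9. The numbering of the varieties from 0 to 80 and the names used are assumptions for illustration only.

```python
from fractions import Fraction

p = 9  # 81 varieties in 9 groups of 9

# Write the varieties (numbered 0..80 for illustration) in a p x p square.
square = [[r * p + c for c in range(p)] for r in range(p)]

# First set of blocks: the rows. Second set: the columns.
row_blocks = [set(row) for row in square]
col_blocks = [{square[r][c] for r in range(p)} for c in range(p)]


def precision(p):
    """Precision factor (p + 1) / (p + 3) quoted for this arrangement."""
    return Fraction(p + 1, p + 3)


print(precision(p))
```

Every row block meets every column block in exactly one variety, which is what allows comparisons to be carried across the groupings.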


If the simple problem is given to compare in pairs, as in twins,
a, b, c, d, e, the standard method would be

     a with b
     a  "   c
     a  "   d
     a  "   e

The error of the difference a - b, and so on, and that of the whole
lot were then written down (the expressions are not legible).


Suppose instead all possible pairs were included. The others are

     b - c     c - d
     b - d     c - e
     b - e     d - e

Two replications of the first gives 5 of the second; same precision
for each pair.


Semi-Latin squares. Time did not allow for more than a brief mention
of this type of design. Mr. Yates called attention, however, to a
new publication, intended as a supplement to Fisher's "Design of
Experiments", which discusses in some detail all the foregoing designs
and others for which there was not sufficient time in this lecture. This
publication is Imperial Bureau of Soil Science Technical Communication
No. 35, price 5 shillings: The Design and Analysis of Factorial
Experiments, by F. Yates.
Fig. 1. A 2 x 2 x 2 EXPERIMENT ON POTATOES

Plan and Yields in Lbs.

(Plan of the four blocks of eight plots, with plot yields and block
totals, not reproduced.)

Yields of the Different Combinations of Treatments
(Tons Per Acre)

  (1)    n     k     nk    d     nd    kd    nkd
  2.8   2.8   7.5   8.1   8.6   9.4  11.2   12.1

Dung versus no dung                           Means   Difference
n and k absent           8.6 - 2.8 = 5.8 )
                                         )     6.2  )
n present, k absent                = 6.6 )          )
                                                    )  -2.4 = 2 D x K
n absent, k present                = 3.7 )          )
                                         )     3.8  )
n and k present                    = 4.0 )
                                  4 / 20.1

Mean response of dung  D = 5.025 )           D x K = -1.2 ) Interactions
                       K = 3.8   ) Main      N x K = +0.2 ) between
                       N = 0.6   ) effects   N x D = +0.3 ) two factors
                                         N x D x K = -0.1

S.E.'s:  Single plot                      ±0.50 = 6.2%
         Main effects and interactions    ±0.18

n = sulphate of ammonia
k = sulphate of potash
d = dung
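
The standard error quoted for main effects and interactions follows from that of a single plot, since each effect is the mean of 16 plots minus the mean of 16 other plots. A small sketch of the arithmetic (variable names are hypothetical):

```python
import math

se_single_plot = 0.50   # tons per acre, as quoted in the figure
plots_per_mean = 16     # each effect is one mean of 16 plots minus another

# Variance of a difference of two independent means of 16 plots each.
se_effect = se_single_plot * math.sqrt(2 / plots_per_mean)

print(round(se_effect, 2))
```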







Fig. 2. A SCHEME FOR WEIGHING LIGHT OBJECTS ON A DIAL SCALE

(Zero correction to be determined)

(Each of the quantities 4b, 4c, 4d and so on is expressed as the sum
of four weighings minus the sum of the other four; the subscripts are
not legible.)
Fig. 3. CONFOUNDING IN A 3 x 3 x 3 EXPERIMENT
Fig. 4. 3 x 3 x 3 x 3 QUASI-LATIN SQUARE

(Plan of the square not reproduced.)

Each number indicates a treatment combination of the four factors,
the first digit indicating the combination of the first two factors
and the second digit the combination of the third and fourth factors
according to the following scheme:

(Key, giving the digit for each level of the first factor and each
level of the second factor, not reproduced.)

All main effects and interactions between two factors are clear of
row and column effects; hence, the precision obtained on these
comparisons is that of a 9 x 9 Latin square.





Lecture II. Contrasts between the methods of correlation and
regression.


What is meant when it is said variables are correlated?

What are the underlying principles involving regression and correlation
coefficients and what is the difference between them?

A diagram with two variables, x and y, was introduced which showed
the distribution when high values of x go with high values of y, this
diagram being a set of elliptical curves.


A measure for such a distribution is the correlation coefficient.

The variance of x is σx² or V(x)
 "     "     "  y "  σy² "  V(y)

Also the covariance of x and y is expressed as Cov(xy).

Then the correlation coefficient

          Cov(xy)              S(x - x̄)(y - ȳ)
  r = ---------------  =  --------------------------
      √( V(x) V(y) )      √( S(x - x̄)² S(y - ȳ)² )


This is a ratio and hence unaffected by scale, a special virtue where
scale has no physical meaning.
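
A short sketch of the correlation coefficient as a ratio of sums of products, showing that a change of scale leaves it unchanged. The data and the function names are made up for illustration.

```python
import math


def sum_of_products(u, v):
    """S(u - mean of u)(v - mean of v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def correlation(x, y):
    return sum_of_products(x, y) / math.sqrt(
        sum_of_products(x, x) * sum_of_products(y, y))


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
# Change the scale of x and both the scale and the origin of y.
r_rescaled = correlation([100 * v for v in x], [0.5 * v + 3 for v in y])
```

The two values agree apart from rounding in the arithmetic.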


Here a figure was introduced to illustrate the meaning of correlation.

A line is drawn through the mean of every set of values of y for each
constant x. This line is called the regression of y on x. Y = a + bx may
be used in predicting y given x.


If b is the regression coefficient then Y = ȳ + b(x - x̄)

                 Cov(xy)        σy
estimated b  =  ---------  =  r ----
                  V(x)          σx

If variances are equal, b = r.
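
The relation between b and r can be verified on a small set of figures. The sketch below uses made-up data and hypothetical function names; the second part reduces both variables to unit variance, in which case b and r coincide.

```python
import math


def sum_of_products(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def regression_coefficient(x, y):
    """b = Cov(xy) / V(x); the divisors (n - 1) cancel."""
    return sum_of_products(x, y) / sum_of_products(x, x)


def correlation(x, y):
    return sum_of_products(x, y) / math.sqrt(
        sum_of_products(x, x) * sum_of_products(y, y))


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b = regression_coefficient(x, y)
r = correlation(x, y)
sd_ratio = math.sqrt(sum_of_products(y, y) / sum_of_products(x, x))  # sigma_y / sigma_x

# Put both variables on a scale of unit variance: then b equals r.
sx = math.sqrt(sum_of_products(x, x) / (len(x) - 1))
sy = math.sqrt(sum_of_products(y, y) / (len(y) - 1))
b_equal_variances = regression_coefficient([v / sx for v in x],
                                           [v / sy for v in y])
```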

The regression line passes through the points of vertical tangents.
Another aspect is the reduction of variation in y by use of the
regression coefficient. The variation in y about the regression line is
reduced in the ratio √(1 - r²), that is, the variance in the ratio
(1 - r²).

A table gives the significance of the correlation, r, depending on the
number of observations.

The significance of b may be determined given the standard error of
b. This is easier to comprehend.


Usually the variation is not the same for y and x, as in

y = a + bx + cx² + ..., or y = f(x). Here the correlation coefficient
breaks down, meaning nothing, while regression continues to have meaning.

The following special case was introduced: a problem involving
wheat yield and acreage, where it was of course known that there is zero
yield for zero acres. Do not attempt to bring the regression line to this
point. Use instead the expression "within range of observations."


Another objection to use of the correlation coefficient is that errors
in y will inflate the variance of y, while such errors do not affect b.
There is no physical concept for the correlation coefficient, and
corrections must be made for errors in y. The correlation coefficient is
upset by errors in either x or y or both.
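
The effect of errors in y can be illustrated with contrived figures. In the sketch below (all data and names are assumptions for illustration) each value of x is observed twice, once with an error of +1 in y and once with -1, so that the errors are exactly balanced against x.

```python
import math


def sum_of_products(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope_and_r(x, y):
    sxx, syy, sxy = (sum_of_products(x, x), sum_of_products(y, y),
                     sum_of_products(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Illustrative data: exact relation y = 1 + 2x, each x taken twice.
x = [0, 0, 1, 1, 2, 2, 3, 3]
y_exact = [1 + 2 * v for v in x]
errors = [1, -1] * 4                       # balanced errors in y
y_observed = [v + e for v, e in zip(y_exact, errors)]

b_exact, r_exact = slope_and_r(x, y_exact)
b_observed, r_observed = slope_and_r(x, y_observed)
```

The regression coefficient is the same in both cases, while the correlation coefficient falls because the variance of y has been inflated by the errors.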


Selection of independent variables will cause variations in the
correlation coefficient. Hence, there is never any justification for
computing a correlation coefficient where x is selected or not entirely
random. An example was given where this was done with vitiated results.
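
A contrived illustration of the effect of selecting x (this is not the example given in the lecture; the data and names are assumptions): the same relation is observed over the full range of x and over a narrow selected range.

```python
import math


def sum_of_products(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope_and_r(x, y):
    sxx, syy, sxy = (sum_of_products(x, x), sum_of_products(y, y),
                     sum_of_products(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Illustrative data: y = 2x with an error of +1 or -1, each x taken twice.
x_all = [v for v in range(10) for _ in range(2)]
y_all = [2 * v + e for v in range(10) for e in (1, -1)]

# Select only the observations with x equal to 4 or 5.
selected = [(a, b) for a, b in zip(x_all, y_all) if 4 <= a <= 5]
x_sel = [a for a, _ in selected]
y_sel = [b for _, b in selected]

b_all, r_all = slope_and_r(x_all, y_all)
b_sel, r_sel = slope_and_r(x_sel, y_sel)
```

The regression coefficient is unchanged by the selection, but the correlation coefficient drops sharply, although the relation between y and x is the same in both sets.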


Partial correlation and regression are what one gets with three or
more variables. The joint distribution is specified by

V(x), V(y), V(z), Cov(xy), Cov(xz), Cov(yz),

and Z = c + b1 x + b2 y or Z = f(x, y).





There are also partial correlation and regression coefficients, as
r_zx.y, r_zy.x and

b1 = r_zx.y (σz/σx) √[ (1 - r²_yz) / (1 - r²_xy) ],  etc.
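
The relation between the partial regression coefficient and the partial correlation coefficient, b1 = r_zx.y (σz/σx) √[(1 - r²_yz)/(1 - r²_xy)], can be checked against the least-squares value of b1 in Z = c + b1 x + b2 y. The sketch below uses made-up data and hypothetical names; the standard formula for r_zx.y in terms of the three simple correlations is assumed.

```python
import math


def S(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def corr(u, v):
    return S(u, v) / math.sqrt(S(u, u) * S(v, v))


# Illustrative data only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
z = [3.0, 2.5, 7.0, 6.0, 11.5, 10.0, 15.0, 14.5]

r_zx, r_zy, r_xy = corr(z, x), corr(z, y), corr(x, y)

# Partial correlation of z and x, with y eliminated.
r_zx_y = (r_zx - r_zy * r_xy) / math.sqrt((1 - r_zy ** 2) * (1 - r_xy ** 2))

# Partial regression coefficient from the formula.
sd_ratio = math.sqrt(S(z, z) / S(x, x))  # sigma_z / sigma_x
b1_formula = r_zx_y * sd_ratio * math.sqrt((1 - r_zy ** 2) / (1 - r_xy ** 2))

# The same coefficient from the normal equations of least squares.
b1_least_squares = (S(z, x) * S(y, y) - S(z, y) * S(x, y)) / (
    S(x, x) * S(y, y) - S(x, y) ** 2)
```

The two values of b1 agree, but the connection runs through both r_yz and r_xy, which is why the two kinds of coefficient are not interchangeable.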


It is often believed that it doesn't matter which is used, partial
correlation coefficient or partial regression coefficient. The relation
between the two is not at all simple. The same relation as existed when
variances were equal for two variables is not true for partial
correlation. One should use regression coefficients rather than
correlation coefficients and compute their S.E.'s.

A number of examples from scientific literature were quoted wherein
wrong use had been made of partial correlation coefficients.


Two other important concepts.

1. Intra-class correlation - better to go over to analysis of
variance.

2. Multiple correlation.

Z = b1 x + b2 y; correlate observed and predicted values of Z to obtain
R. The portion R² of the variance is removed, (1 - R²) remaining. Here
again Mr. Yates strongly advocated analysis of variance to be the better
method.


To determine which is more important, x or y, in relation to Z in
Z = b1 x + b2 y: use Z = b1' x and Z = b2' y and compare the total amount
of variance removed in each case. If x and y are the same kind of data,
make a direct comparison between b1 and b2.
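
The comparison of the variance removed by x alone and by y alone can be sketched as follows, using the sum of squares due to a simple regression, S(zx)²/S(xx). The data and names are made up for illustration.

```python
def S(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def sum_of_squares_removed(z, x):
    """Sum of squares of z accounted for by the simple regression Z = b' x."""
    return S(z, x) ** 2 / S(x, x)


# Illustrative data only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
z = [3.0, 2.5, 7.0, 6.0, 11.5, 10.0, 15.0, 14.5]

total = S(z, z)
removed_by_x = sum_of_squares_removed(z, x)
removed_by_y = sum_of_squares_removed(z, y)

print(round(removed_by_x / total, 3), round(removed_by_y / total, 3))
```

The variable removing the larger share of the total sum of squares is the more important in this sense.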


An account of the group conferences must of necessity be very general,
since these were mostly extempore discussions of a number of complex
experiments now being conducted. Mr. Yates was able to clear up many
obscure points in the minds of some regarding designs and the technique
of analyses, and to offer timely suggestions and warnings of possible
pitfalls which might vitiate conclusions from such experiments.

A long-time rotation wheat experiment on the use of various
fertilizers, in process at Rothamsted, was described by Mr. Yates.
Already valuable information is being gained although the experiment has
a number of years yet to go.


A regional cotton wilt variety-fertilizer study in 12 locations for
a 3-year period, involving 12 varieties with three levels of potash and 3
replications at each location, was discussed at length and the degrees of
freedom by factors outlined. Mr. Yates called attention to the fact that
while the pooling of the error term (in this case 70 x 35) from different
locations resulted in great strength, it might not be justifiable. He
illustrated his meaning by an example. Suppose 12 varieties at 6 places
with 2 replications at each.


Varieties, V             11
Locations, L              5    ... interest if L.V is significant
L x V                    55

Blocks, 6 x (2 - 1)       6
Error                    66
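
The partition of the degrees of freedom for this example can be written out as below; the function name and the dictionary labels are hypothetical.

```python
def degrees_of_freedom(varieties, locations, replications):
    """Degrees of freedom for randomized blocks repeated at several places."""
    return {
        "V": varieties - 1,
        "L": locations - 1,
        "L x V": (varieties - 1) * (locations - 1),
        "Blocks": locations * (replications - 1),
        "Error": locations * (replications - 1) * (varieties - 1),
    }


df = degrees_of_freedom(varieties=12, locations=6, replications=2)
total = sum(df.values())   # should equal the number of plots less one
print(df, total)
```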


Suppose locations are a random sample of all locations. Are varieties
significant? This is answered by comparison of V and LV.

Question of testing consistency of varietal differences.
Suppose one variety is to be compared with the others.

Then   V1         1     Is there a mean difference
       VR        10     in V1 with others?
       L          5
       LV1        5 )
       LVR       50 )   all equal to error
       Error     66 )


LV1 may be very different from LVR; then pooling may not be desirable.

Suppose the 1st variety is up; then analyze by setting out its
differences from the others at the several places and test by t. If V2
is a little different one could take V2 - VR, etc.


Comparing each with the mean of all others is a better basis than
each with the mean of all.



Precision of comparison of one variety with a standard, where the
standard has more replication, is increased over the precision of
comparison of two varieties. Hence, if there should exist a standard it
is well to increase the number of replications of it.
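
The gain from extra replication of a standard follows from the variance of a difference of two means, σ²(1/r1 + 1/r2). A minimal sketch, with a hypothetical function name and illustrative replication numbers:

```python
from fractions import Fraction


def variance_of_difference(reps_a, reps_b):
    """Variance of (mean a - mean b) in units of the single-plot variance."""
    return Fraction(1, reps_a) + Fraction(1, reps_b)


two_varieties = variance_of_difference(4, 4)          # 4 replications each
variety_vs_standard = variance_of_difference(8, 4)    # standard replicated 8 times
print(two_varieties, variety_vs_standard)
```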


Suppose errors are different then test these for significance. It 
is well to start with separate errors and later pool if not significantly 
different. 


Mr. Yates described the work being done by the British Commission
of Forestry in making a survey of forest resources, for which he is now
acting as Consultant. For such a survey he considered the best solution
would be to make a random selection of type sections and then take an
intensive grid sample of each of these. For British conditions the
circular plot seemed adequate, with about 1/2% of total area.


For cruising a smaller area, e.g., a county, he recommended that
it be divided into a number of sections and not less than two random plots
be taken in each such subdivision. An alternate method might be the one
used in the British Survey and described above, a grid of plots in
randomly selected subdivisions of the county. When the grid arrangement
was used Mr. Yates intimated that the sampling error might better be
computed by differences of adjoining plots but did not give the details
of this procedure. He advocated that in countries where forest regions
are so variable, it would be most desirable for research to be done in
these various forestry conditions to determine what the best methods are,
such research to involve type of sampling, size and shape of plots and
degree of sampling necessary for a particular precision.
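
The first recommendation, sections with not less than two random plots in each, leads to an estimate and sampling error of the usual stratified kind. The sketch below is an illustration under stated assumptions: the section areas and plot volumes are made-up figures, the function name is hypothetical, and no correction is made for the fraction of the area sampled (small at about 1/2%). It does not attempt the adjoining-plot method, whose details were not given.

```python
import math
import statistics


def stratified_estimate(sections):
    """Mean per unit area and its standard error.

    sections: list of (area, plot_values) with at least two random plots
    per section, so that each section supplies its own error estimate.
    """
    total_area = sum(area for area, _ in sections)
    mean = sum(area / total_area * statistics.mean(plots)
               for area, plots in sections)
    variance = sum((area / total_area) ** 2
                   * statistics.variance(plots) / len(plots)
                   for area, plots in sections)
    return mean, math.sqrt(variance)


# Illustrative figures only: (area of section, volumes per acre on its plots).
sections = [(100, [10.0, 14.0]), (300, [20.0, 24.0, 22.0])]
mean_per_acre, standard_error = stratified_estimate(sections)
print(mean_per_acre, standard_error)
```

Two plots per section is the least that gives a variance within each section, which is presumably the reason for the minimum stated.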